PREFACE


The Agriculture and Resources Inventory Surveys Through Aerospace Remote
Sensing is a multiyear program of research, development, evaluation, and
application of aerospace remote sensing for agricultural resources, which began in
fiscal year 1980. This program is a cooperative effort of the U.S. Department
of Agriculture, the National Aeronautics and Space Administration, the National
Oceanic and Atmospheric Administration (U.S. Department of Commerce), the
Agency for International Development (U.S. Department of State), and the
U.S. Department of the Interior.

The work which is the subject of this document was performed within the Earth 
Resources Research Division, Space and Life Sciences Directorate, at the 
Lyndon B. Johnson Space Center, National Aeronautics and Space Administration. 
EVALUATION OF TRENDS IN WHEAT YIELD MODELS


INTRODUCTION : 

The CCEA models for wheat yield in the U.S. Great Plains (Refs. 1 and 2)
are multiple regression equations using year numbers and selected weather
variables as independent variables. The models assume that the yield can be
represented closely by a linear model with one or more trends. For each year
the yield may be expressed

$$Y = C + \sum_{i=1}^{n} b_i X_i \qquad (1)$$

where Y is the yield, C is a constant, b_i is the coefficient of the independent
variable X_i, and n is the number of independent variables. Figure 1 shows the
spring and winter wheat areas to which the models apply. Figures 2 and 3 are
typical curves, for spring and winter wheat, showing the measured yields and
the trend lines of the regression models, about which the yields vary. Values
used in plotting the curves were taken from References 1 and 2. The constant
plotted in each case is the sum of the indicated constant and trend coefficients;
variables used in the calculations to represent the trends have values of one
until the year of the slope change, then increase by one each year until the end
of that trend, at which time they become constant.
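The trend-variable rule just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CCEA code: the helper names `ccea_trend_variable` and `trend_line` are hypothetical, the coefficient values are made up, and the variable is assumed to equal one through the slope-change year itself. Because every trend variable equals one before its slope change, the early-year value of the trend line is the constant plus all trend coefficients, which is the plotted constant described above.

```python
def ccea_trend_variable(years, change_year, end_year=None):
    """Hypothetical helper: 1 through change_year, +1 per year afterwards,
    held constant after end_year when the trend ends (assumed convention)."""
    values = []
    for year in years:
        last = year if end_year is None else min(year, end_year)
        values.append(1 + max(0, last - change_year))
    return values


def trend_line(constant, coefficients, trend_columns):
    """Evaluate C + sum_i b_i * X_i(year) using only the trend variables."""
    n_years = len(trend_columns[0])
    return [constant + sum(b * col[k] for b, col in zip(coefficients, trend_columns))
            for k in range(n_years)]


years = list(range(1950, 1961))
first = ccea_trend_variable(years, change_year=1953, end_year=1957)
second = ccea_trend_variable(years, change_year=1957)
line = trend_line(10.0, [0.5, 0.2], [first, second])  # illustrative coefficients only
print(first)  # [1, 1, 1, 1, 2, 3, 4, 5, 5, 5, 5]
```

Here `line[0]` is 10.0 + 0.5 + 0.2, the sum of the constant and both trend coefficients.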

The trend lines alone do not appear to provide a good fit to the observed
values of yield. Marquina (3) discusses the use of multiple regression in the
prediction of yield as a function of meteorological variables. Correlation
among the independent variables causes multicollinearities, which can lead
to misleading results: the coefficients associated with the variables may be
too large, their signs may be incorrect, and adding one or more new observations
may change the size and the sign of one or more of the coefficients. Marquina
discusses techniques which may be used to find a regression model when there
are a large number of independent variables and linear relations
(multicollinearities) exist among them. The method of leaps and bounds is used for
finding the subset of variables constituting the best regression. Principal
components and latent root regression are used for eliminating multicollinearities.
Finally, generalized ridge regression is used for introducing bias to provide
stability in the data matrix.

Figure 1.- Spring and winter wheat growing areas in the U.S. Great Plains.

Figure 2.- Trend lines for the CCEA models for spring wheat
yield in the Red River Valley.

Figure 3.- Trend lines for the CCEA models for winter wheat
yield in Colorado.


Techniques discussed by Marquina (3) are implemented in programs SELECT and
BEIRA, written by Marquina and modified by Johnson Space Center personnel.
Program SELECT uses the method of leaps and bounds, described by Furnival and
Wilson (4). Binary trees of all possible regressions are searched, and tests are
made, to identify the best regression for each subset size without computing
all possible regressions. Program BEIRA performs the regression and provides
the information for using the techniques of principal components and latent root
regression to eliminate multicollinearities. Bias may be introduced by changing
one or more of the eigenvalues of the correlation matrix of the independent
variables.

Each program allows the user to choose the number of trends (one, two or 
none), the year of the slope change, and either of two models, "dependent" or 
"independent." In the dependent model, the yield varies about a line which is 
piecewise continuous, with the slope changing at the specified year. In the 
independent model, the line about which the yield varies is discontinuous at 
the year of the slope change. 

Models were developed using programs SELECT and BEIRA and the data from
which the CCEA models were derived, for the wheat growing areas of Figure 1. The
trend lines of these models appear to provide a better fit to the measured yields
than the trend lines of the CCEA models. Table 1 lists the values of the coefficients
of multiple determination, R², and the residual variances, S², for the CCEA models
and for those developed using SELECT and BEIRA, for the five spring wheat and
nine winter wheat growing areas. Values of R² and S² for the SELECT-BEIRA
models are underlined to indicate their comparison with the CCEA models. A
double underline indicates that a value of R² is greater, or a value of S² is
smaller, than the corresponding values for both CCEA models for the same area.
A single underline indicates that R² is larger, or S² smaller, than the
corresponding value for one CCEA model. It is seen that R² and S² are both
underlined at least once for 22 of the 28 SELECT-BEIRA models. Of these, R² and S²
are both underlined twice for 17 models, and R² and S² for both the dependent and
independent models are underlined twice for 6 wheat growing areas. It is
apparent that wheat yield models can be developed having larger values of R²
and smaller values of S² than the CCEA models for Phase III and TY of LACIE.

TABLE 1.- R² AND S² FOR THE CCEA MODELS AND THOSE
DEVELOPED USING SELECT AND BEIRA

                                    CCEA MODELS          MODELS DEVELOPED USING
                                                             SELECT-BEIRA
                                PHASE III      TY          Dep.        Ind.

SPRING WHEAT

Minnesota                  R²     .940        .859        .940        .954*
                           S²    2.030*      4.798       2.344       2.017
Montana                    R²     .855        .821        .925        .937*
                           S²    2.413       3.254       1.563       1.407*
North Dakota               R²     .883        .878        .875        .897*
                           S²    3.822       3.604       3.701       3.142*
Red River Valley           R²     .877        .870        .915        .964*
                           S²    4.108       4.154       2.987       1.314*
South Dakota               R²     .897*       .843        .827        .882
                           S²    2.099*      3.029       3.339       2.807

WINTER WHEAT

Badlands                   R²     .784        .754        .822*       .822*
                           S²    8.689       9.954       7.79?       7.578*
Colorado                   R²     .775        .781        .863        .872*
                           S²    4.251       3.938       3.169       2.855*
Kansas                     R²     .915        .928        .932        .944*
                           S²    2.750       2.139       2.372       2.?22*
Montana                    R²     .835        .824        .856        .?67*
                           S²    3.577       3.606         ?         2.956*
Nebraska                   R²     .901        .884        .872        .943*
                           S²    4.325*      5.064       5.772       2.885
Oklahoma                   R²     .900        .868        .890        .946*
                           S²    2.273       2.901       3.145       1.517*
Texas Edwards Plateau      R²     .831*       .760        .786        .809
                           S²    1.670*      1.905       1.908       1.701
Texas Low Plains           R²     .837        .847        .863        .884*
                           S²    1.607       1.590       1.386       1.214*
Texas-Oklahoma Panhandle   R²     .890        .884        .877        .903*
                           S²    2.643*      3.008       3.301       2.771

Double underline: R² larger (S² smaller) than both CCEA models.
Single underline: R² larger (S² smaller) than one CCEA model.
* Largest R² (smallest S²) of the four models for an area.

Programs SELECT and BEIRA and the techniques available with BEIRA are
described briefly in Appendix A.


METHOD : 

Programs SELECT and BEIRA were used with yield and weather data, for the
spring and winter wheat areas shown in Figure 1, for the years 1932 through
1976. SELECT was used to find the year of slope change and the subset of
independent variables with the largest value of adjusted R², given by

$$R^2_{\mathrm{adj}} = 1 - \left(1 - R^2\right)\frac{n-1}{n-m} \qquad (2)$$

where R² is the coefficient of determination,
m is the number of variables included in the regression, and
n is the number of observations.
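The penalty in equation (2) can be illustrated numerically: for a fixed R², adding variables lowers the adjusted value, so a larger subset is preferred only if it raises R² enough. The sketch below assumes the common (n − 1)/(n − m) penalty factor; `adjusted_r2` is a hypothetical helper, and the R² value is illustrative.

```python
def adjusted_r2(r2, n, m):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - m).

    Assumption: the standard (n - 1)/(n - m) penalty factor, with m the number
    of variables in the regression and n the number of observations.
    """
    if n <= m:
        raise ValueError("need more observations than variables")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - m)


n_years = 1976 - 1932 + 1  # 45 annual observations in the study period
small = adjusted_r2(0.90, n_years, 6)
large = adjusted_r2(0.90, n_years, 10)  # same R^2, more variables: lower adjusted R^2
```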

Subsets of independent variables and slope-change years identified using SELECT
were used as input to BEIRA to run the regressions. Programs SELECT and BEIRA
are described briefly in Appendix A. Independent variables used as input to
SELECT are listed in Table B-1 in Appendix B.

Program SELECT was run for different slope-change years, for each wheat
growing area and model (dependent, with a piecewise continuous trend line, and
independent, with a discontinuous trend line), until there were enough points
to allow identification of the slope-change year yielding a regression with the
largest value of adjusted R². Results are shown in Figures B-1 through B-14 in
Appendix B. Variables included in the regressions with peak values of adjusted R²
are listed on the plots.

Regressions were run, using program BEIRA, for the slope-change years with
maximum values of adjusted R² in the plots and the indicated independent variables.
The process of finding a "best regression" for a model and wheat growing area
is described in Appendix C. In the course of running the regressions it was
found for several areas that, after the multicollinearities were removed, the
values of R² were smaller than those for the CCEA models. More regressions were
then run for the slope-change years indicated by the secondary peaks in the plots of
Figures B-1 through B-14 in Appendix B. When values of adjusted R² for two or more
slope-change years were nearly equal, regressions were run for each of them.


For comparison with the CCEA models, regressions were run, using the
variables chosen by SELECT, for the slope-change years of the CCEA models. For most
of the winter wheat areas, the CCEA models have more than one slope change.
Regressions were run for each slope-change year except when the slope changed
after 1970 and the yield values did not appear to level off. Variables chosen
by SELECT for each slope-change year and model (dependent and independent) are
given in Table B-2 in Appendix B.


RESULTS : 

Table 2 lists the best regressions for the slope-change years determined
using SELECT and for the slope-change years of the CCEA models. Quantities
given for each regression include R², the coefficient of multiple determination,
which indicates goodness of fit; S², the residual variance; n_Δ, the difference
between the number of predicted values greater than and the number less than the
measured values; the number of variables in the regression; the variables deleted
from the subset chosen by SELECT; and whether bias was introduced by increasing the
value of an eigenvalue. For each wheat area, regressions for the slope-change
years from SELECT are followed by those for the CCEA slope-change years. When
there are regressions for more than one of the slope-change years from SELECT,
the one chosen as the best is underlined.
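The balance statistic n_Δ can be computed directly from the predicted and measured yields. In this sketch the function name `sign_balance` is hypothetical, and years in which the prediction equals the measurement are counted in neither group, which is an assumption. A value near zero indicates predictions scattered evenly above and below the measurements.

```python
def sign_balance(predicted, measured):
    """n_Delta: number of predictions above the measured value minus number below it."""
    above = sum(1 for p, y in zip(predicted, measured) if p > y)
    below = sum(1 for p, y in zip(predicted, measured) if p < y)
    return above - below  # ties are ignored (assumption)


n_delta = sign_balance([3.0, 2.0, 5.0, 4.0], [2.0, 2.0, 6.0, 1.0])  # 2 above, 1 below
```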

The best regressions were chosen taking into consideration the values of R²
and S²; the distribution of the predicted values above and below the measured
values, as indicated by n_Δ; the uniformity of the eigenvalues of R (the matrix of
correlation coefficients of the independent variables); and the values of |R| and
tr(R⁻¹), which indicate the stability of the data matrix. The values of R² and S²
for the CCEA slope-change years are underlined to indicate the comparison with
those of the CCEA models. Two underlines indicate a value of R² larger, or a value
of S² smaller, than those for both CCEA models. One underline indicates a value of
R² greater, or a value of S² smaller, than the value for one of the CCEA models.
The trend line curves were found to take seven different forms, depending on
the values of the two trend coefficients, β_T1 and β_T2, and on the relation between
the coefficient of the constant variable, β_c, and the quantity β_T1(T_c - T_o), where
T_c is the year of the slope change and T_o is the first year for which data are given.
The models with trend lines of each form are listed in Table 3 by kind of model
(dependent or independent), wheat growing area (S or W to indicate spring or
winter wheat), and slope-change year. Figures 4 and 5 show the forms of the
models and the geographical distribution of the different types.

Plots of the yield values and the trend lines for the 28 models are given in
Figures 6 through 19, on the same scale as Figures 2 and 3. Computer-generated
plots, which show the predicted and measured values, are given in Appendix D.
Regression coefficients are listed in Table 4. Plots of the CCEA models, other 
than those shown in Figures 2 and 3, are given in Appendix E, for comparison; a 
list of variables and regression coefficients follows the plots. 

CONCLUSIONS :

It is concluded, as stated in the Introduction, that yield models can be
developed that have larger values of R² and smaller values of S² than the CCEA
models of References 1 and 2, and that programs SELECT and BEIRA provide an
objective method for developing yield models based on the variables which best
describe the yield of a given area. It is evident from Figures 4 and 5 that a
model of one form will not serve to describe the wheat yield for all of the wheat
growing areas in the U.S. Great Plains. Nor will one slope-change year provide
a good description of the change in wheat yield with time. Slope-change years
for the "best fit" dependent models varied between 1934 and 1965, except for the
one with slope change in 1975. The slope-change years for independent models
ranged from 1943 to 1969, and 1975. Table 1 shows that the independent models, in
which the trend lines are discontinuous at the slope-change year, have consistently
larger values of R² and smaller values of S² than the dependent models, with trend
lines that are piecewise continuous.

This leads to the conclusion that a model with a "jump" in yield, or with
three trends, may provide the best description of yield as a function of time.
TABLE 3: TYPES OF WHEAT YIELD MODELS

TYPE 1. A constant followed by an increasing trend: β_T1 = 0, β_T2 > 0. Eight of
        the 14 dependent models and 2 of the 14 independent models have this form.

        Dependent:    Minnesota (S), 1953
                      North Dakota (S), 1953
                      Red River Valley (S), 1951
                      South Dakota (S), 1953
                      Montana (W), 1934
                      Nebraska (W), 1938
                      Texas Edwards Plateau (W), 1949
                      Texas Low Plains (W), 1948

        Independent:  For both of these models β_c > 0; the trend lines describe a
                      wheat yield which remains constant until the slope-change year,
                      then increases from a yield value higher than the previous
                      constant value.
                      Red River Valley (S), 1955
                      Texas Low Plains (W), 1956

TYPE 2. Two increasing trends: β_T1 > 0, β_T2 > 0. Four dependent models and four
        independent models are of this form.

        Dependent:    Montana (S), 1959
                      Colorado (W), 1965
                      Kansas (W), 1955
                      Texas-Oklahoma Panhandle (W), 1949

        Independent:  β_c = 0: the second trend starts at the overall constant, the
                      same yield value as at the start of the first trend.
                      Kansas (W), 1946

                      β_c < β_T1(T_c - T_o): the second trend starts at a lower yield
                      value than the end of the first trend.
                      Montana (S), 1943
                      Montana (W), 1944
                      Texas-Oklahoma Panhandle (W), 1946

TYPE 3. An increasing trend followed by a constant: β_T1 > 0, β_T2 = 0.

        Dependent:    Oklahoma (W), 1963

        Independent:  β_c > β_T1(T_c - T_o): the second trend starts at a higher
                      yield value than the end of the first trend.
                      North Dakota (S), 1961
                      Oklahoma (W), 1958

TYPE 4. An increasing trend followed by a decreasing trend: β_T1 > 0, β_T2 < 0.

        Independent:  β_c > β_T1(T_c - T_o): the second trend starts at a higher
                      yield value than the end of the first trend.
                      South Dakota (S), 1966
                      Nebraska (W), 1969

TYPE 5. A decreasing trend followed by an increasing trend: β_T1 < 0, β_T2 > 0.

        Independent:  Because of the initial decreasing trend, both of these models
                      have a second trend which starts at a yield value higher than
                      the end of the first trend.
                      Minnesota (S), 1954, β_c > 0
                      Colorado (W), 1944

TYPE 6. A constant value followed by a constant value: β_T1 = 0, β_T2 = 0.

        Independent:  Texas Edwards Plateau (W), 1957

TYPE 7. An increasing trend until 1975 followed by a low value in 1976. The trend
        lines for the two models are almost identical: the dependent model has a
        large negative slope from 1975 to 1976, and the independent model has
        β_T2 = β_c = 0, so the value of the trend line curve for 1976 is the same
        as the overall constant.

        Dependent:    Badlands (W), 1975
        Independent:  Badlands (W), 1975
TABLE 4.- COEFFICIENTS FOR BEST FIT REGRESSIONS

(b) Winter wheat

(Numbers in parentheses are variable numbers.)
Figure 4.- Geographical distribution of the four types of dependent models.

Figure 6.- Trend lines for models from SELECT-BEIRA of spring wheat
yield in Minnesota.

Figure 7.- Trend lines for models from SELECT-BEIRA of spring wheat
yield in Montana.

Figure 8.- Trend lines for models from SELECT-BEIRA of spring wheat
yield in North Dakota.

Figure 9.- Trend lines for models from SELECT-BEIRA of spring wheat
yield in the Red River Valley.

Figure 10.- Trend lines for models from SELECT-BEIRA of spring wheat
yield in South Dakota.

Figure 11.- Trend lines for models from SELECT-BEIRA of winter wheat
yield in the Badlands.

Figure 14.- Trend lines for models from SELECT-BEIRA of winter wheat
yield in Montana.

Figure 15.- Trend lines for models from SELECT-BEIRA of winter wheat
yield in Nebraska.

Figure 16.- Trend lines for models from SELECT-BEIRA of winter wheat
yield in Oklahoma.

Figure 17.- Trend lines for models from SELECT-BEIRA of winter wheat
yield in the Texas Edwards Plateau.
APPENDIX A


PROGRAMS SELECT AND BEIRA, AND TECHNIQUES FOR
ELIMINATING MULTICOLLINEARITIES

Programs SELECT and BEIRA were written by Marquina (3) and modified by JSC
personnel. Both programs use the same input variables: in the present analysis,
the year, selected meteorological variables, and the yield. Both programs
perform calculations for a multiple linear regression varying about either a constant
followed by an upward trend, or two upward trends. The year of the slope change
is specified by the user. SELECT finds the best regressions for subsets of each
size from one through the number of independent variables. BEIRA performs the
regressions and provides information for use in identifying and removing
multicollinearities.

The yield for a given year may be expressed

$$y = C + \beta_{T1} x_{T1} + \beta_{T2} x_{T2} + \beta_c x_c + \sum_{i=1}^{m} \beta_i x_i + \varepsilon \qquad \text{(A-1)}$$

where y is the yield,

C is an overall constant,

β_T1 and β_T2 are the coefficients for the two trend variables, x_T1 and x_T2,

β_c is the coefficient for the constant variable, x_c,

m is the number of independent variables in addition to the trend and
constant variables,

β_i is the coefficient of the ith variable, x_i, and

ε is the random error.

Trend and constant variables generated by the program are illustrated in the
diagram. Both trend variables start at zero and increase by one each year:
x_T1 increases from the beginning of the data set, and x_T2 is zero through the
slope-change year, then increases by one each year. In the dependent model x_T1
remains constant after the year of the slope change; the constant variable is not
used. In the independent model x_T1 becomes zero after the slope change, and x_c is
zero through the slope-change year and one afterwards. In the models with one trend,
x_T1 becomes the constant variable for the independent model and takes the form
of x_c in the diagram; x_T2, the trend variable, is 1 before the slope change for
the dependent model. For the independent model x_T2 is as shown, and x_c is not
used.
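The two-trend variables described above can be generated as follows. This NumPy sketch covers only the two-trend case and assumes consecutive annual data starting at the first year; `trend_design` is a hypothetical helper, not part of SELECT or BEIRA. With these columns, a dependent-model trend line C + β_T1 x_T1 + β_T2 x_T2 is continuous at the slope-change year, while an independent-model line can jump by β_c - β_T1(T_c - T_o) when the first trend drops out, which is the comparison used in Table 3.

```python
import numpy as np


def trend_design(years, change_year, model):
    """Trend and constant columns for the two-trend models (hypothetical helper).

    Returns columns (x_T1, x_T2) for "dependent" and (x_T1, x_T2, x_c) for
    "independent". Assumes consecutive annual data starting at years[0].
    """
    years = np.asarray(years)
    t = years - years[0]                                           # x_T1 starts at zero
    x_t2 = np.where(years > change_year, years - change_year, 0)   # zero through the change year
    if model == "dependent":
        x_t1 = np.minimum(t, change_year - years[0])               # held constant after the change
        return np.column_stack([x_t1, x_t2])
    if model == "independent":
        x_t1 = np.where(years <= change_year, t, 0)                # drops to zero after the change
        x_c = (years > change_year).astype(int)                    # 0 through the change year, then 1
        return np.column_stack([x_t1, x_t2, x_c])
    raise ValueError("model must be 'dependent' or 'independent'")


years = np.arange(1932, 1977)
dep = trend_design(years, 1953, "dependent")
ind = trend_design(years, 1953, "independent")
```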
TREND AND CONSTANT VARIABLES FOR SELECT AND BEIRA


Program SELECT : 

SELECT uses the method of leaps and bounds described by Furnival and Wilson (4).
For each case SELECT finds the best regression for subsets of each size from one
through the number of independent variables. Three options are available as criteria
for determining the best regression: R², adjusted R², and Mallows' C_p. The adjusted
R², used in this study, is given by

$$R^2_{\mathrm{adj}} = 1 - \left(1 - R^2\right)\frac{n-1}{n-m} \qquad \text{(A-2)}$$

where m is the number of variables included in the regression, n is the number of
observations, and R² is given by

$$R^2 = \frac{\sum_t \left(\hat{y}_t - \bar{y}\right)^2}{\sum_t \left(y_t - \bar{y}\right)^2} \qquad \text{(A-3)}$$

where ŷ_t is the predicted yield for a given year, y_t is the measured yield, and ȳ is
the mean.
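The selection criterion can be made concrete with a brute-force search. Note that this sketch enumerates every subset rather than using the leaps-and-bounds algorithm of Furnival and Wilson, so it is practical only for a handful of candidate variables; it computes R² as in equation (A-3) and assumes the (n - 1)/(n - m) penalty with m counting the fitted constant. The function names and demo data are hypothetical.

```python
import itertools

import numpy as np


def r_squared(X, y):
    """Equation (A-3) for an OLS fit of y on X with an intercept column."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    return np.sum((fitted - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)


def best_subsets(X, y):
    """Exhaustive search (not leaps and bounds): best adjusted R^2 per subset size."""
    n, p = X.shape
    best = {}
    for size in range(1, p + 1):
        for cols in itertools.combinations(range(p), size):
            r2 = r_squared(X[:, list(cols)], y)
            m = size + 1  # assumption: m counts the predictors plus the constant
            adj = 1.0 - (1.0 - r2) * (n - 1) / (n - m)
            if size not in best or adj > best[size][1]:
                best[size] = (cols, adj)
    return best


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(45, 4))
y_demo = 2.0 * X_demo[:, 0] - 1.5 * X_demo[:, 2] + 0.1 * rng.normal(size=45)
best = best_subsets(X_demo, y_demo)
```

For the synthetic data, the best two-variable subset is the pair of columns actually used to build `y_demo`.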


Program BEIRA and Ordinary Least Squares : 

Program BEIRA performs the regressions using the method of ordinary least
squares. The yield may be described as a function of the independent variables,

$$Y = \beta_0 + \sum_{i=1}^{p} \beta_i x_i + \varepsilon \qquad \text{(A-4)}$$

where Y is the yield, β_0 is a constant, the β_i's are the coefficients of the p
independent variables, x_i, and ε is the random error. Using vector notation the
model for yield may be written

$$\mathbf{Y} = \boldsymbol{\beta}_0 + X\boldsymbol{\beta} + \boldsymbol{\varepsilon} \qquad \text{(A-5)}$$

where Y is an n × 1 vector of yield measurements, β_0 is an n × 1 vector of equal
constants, X is an n × p matrix of measurements of the independent variables, β is a
p × 1 vector of regression coefficients, and ε is an n × 1 vector of random errors.
The least squares estimate is found by minimizing ε'ε with respect to β, where ε'
is the transpose of ε. The solution of the resulting normal equations

$$X'X\mathbf{b} = X'\mathbf{Y}$$

is given by

$$\mathbf{b} = \left(X'X\right)^{-1}X'\mathbf{Y} \qquad \text{(A-6)}$$

where b is the vector of estimates of the true values of the β's. When the variables
are standardized as in equation (A-7), the matrix X'X is the matrix of the simple
correlation coefficients of the independent variables.
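Equation (A-6) can be evaluated by solving the normal equations directly. In this NumPy sketch the constant β_0 is represented by a column of ones in X, which is an assumption about layout rather than how BEIRA stores it; `ols_coefficients` is a hypothetical name. Solving the linear system avoids forming (X'X)⁻¹ explicitly, which is numerically safer.

```python
import numpy as np


def ols_coefficients(X, Y):
    """Solve the normal equations X'X b = X'Y (equation A-6) for b."""
    return np.linalg.solve(X.T @ X, X.T @ Y)


# Column of ones for the constant, then two illustrative independent variables.
X = np.column_stack([np.ones(5), [1.0, 2, 3, 4, 5], [2.0, 1, 0, 1, 3]])
Y = X @ np.array([1.0, 2.0, -3.0])  # noise-free data built from known coefficients
b = ols_coefficients(X, Y)          # recovers approximately [1, 2, -3]
```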


In performing the regression, program BEIRA first standardizes the variables
by calculating

$$x^*_{ij} = \frac{x_{ij} - \bar{x}_j}{\sqrt{\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2}} \qquad \text{(A-7)}$$


Then the regression coefficients are calculated. The resulting regression
equation describes yield as a function of the deviations of the independent variables
from their means, except those representing the constant and trends. For the
case with two trends it may be written

$$\hat{y}_i = C + b_c x_{c,i} + b_{T1} x_{T1,i} + b_{T2} x_{T2,i} + \sum_{j=1}^{p} b_j\left(x_{ij} - \bar{x}_j\right), \qquad i = 1, \ldots, n \qquad \text{(A-8)}$$

where b_c, b_T1, and b_T2 are the coefficients of the constant and trend variables x_c,
x_T1, and x_T2, p is the number of independent variables, and n is the number of
observations.


The constant, C, is evaluated by substituting the average values of the yield
and independent variables into equation (A-8). Since the average of the deviations
from the means is zero, the constant becomes

$$C = \bar{y} - b_{T1}\bar{x}_{T1} - b_{T2}\bar{x}_{T2} - b_c\bar{x}_c \qquad \text{(A-9)}$$
For the standardized variables

$$\sum_{i=1}^{n} x^*_{ij} = 0 \quad \text{and} \quad \sum_{i=1}^{n}\left(x^*_{ij}\right)^2 = 1, \qquad j = 1, 2, \ldots, p$$
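The standardization in equation (A-7), and the reason it turns X'X into the correlation matrix, can be checked numerically. In this NumPy sketch `standardize` is a hypothetical name and the correlated demo data are arbitrary. Each column is centered and scaled to unit length, so the column sums are zero, the sums of squares are one, and the cross products equal the sample correlations.

```python
import numpy as np


def standardize(X):
    """Equation (A-7): center each column and scale it to unit length."""
    centered = X - X.mean(axis=0)
    return centered / np.sqrt((centered ** 2).sum(axis=0))


rng = np.random.default_rng(1)
mixing = np.array([[1.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 1.0]])
X = rng.normal(size=(30, 3)) @ mixing  # correlated columns for illustration
Xs = standardize(X)
```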

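R², the goodness of fit for each regression, is calculated using the vector expression

$$R^2 = \frac{\mathbf{b}'X'\mathbf{Y}}{\mathbf{Y}'\mathbf{Y}} \qquad \text{(A-10)}$$

The residual variance, S², is calculated from

$$S^2 = \frac{\mathbf{Y}'\mathbf{Y} - \mathbf{b}'X'\mathbf{Y}}{n - p} \qquad \text{(A-11)}$$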
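Equations (A-10) and (A-11) rest on the identity that, for mean-centered variables, b'X'Y is the regression sum of squares and Y'Y - b'X'Y is the residual sum of squares. This NumPy sketch checks that identity under the assumption of mean-centered X and Y, and uses the n - p denominator as written in (A-11); `fit_stats` and the demo data are hypothetical.

```python
import numpy as np


def fit_stats(X, Y):
    """Return b, R^2 (A-10), and S^2 (A-11) computed from mean-centered X and Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean()
    b = np.linalg.solve(Xc.T @ Xc, Xc.T @ Yc)
    explained = b @ Xc.T @ Yc            # b'X'Y, the regression sum of squares
    total = Yc @ Yc                      # Y'Y, the total sum of squares about the mean
    n, p = Xc.shape
    r2 = explained / total
    s2 = (total - explained) / (n - p)   # residual sum of squares over n - p, as in (A-11)
    return b, r2, s2


rng = np.random.default_rng(2)
X = rng.normal(size=(40, 3))
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=40)
b, r2, s2 = fit_stats(X, Y)
```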


Predicted values of the yield for each year are calculated using equation (A-8).
Differences between the predicted and observed yield, Δ_i = ŷ_i - y_i, are calculated
for each observation. Plots are generated showing the observed and
predicted values and the contributions of the constant and trend terms to the
regression equation.

BEIRA also prints out the correlation matrix for the independent and dependent
variables, the latent root regression (LRR) eigenvalues and eigenvectors, and the
eigenvalues of the correlation matrix of the independent variables.


Principal Components Regression : 

In principal components regression the independent variables are transformed
to their principal components by multiplying the data matrix, X, by the matrix
of eigenvectors of X'X, the correlation matrix of the independent variables. The
transformation is given by

$$Z = XS$$

where S is an orthogonal matrix whose columns are the eigenvectors of X'X. Then
the columns of Z are the principal components of X. The regression equation is
now a function of the principal components of X rather than of the original
independent variables. The regression coefficients are given by

$$\boldsymbol{\gamma} = \left(Z'Z\right)^{-1}Z'\mathbf{Y} = L^{-1}Z'\mathbf{Y} \qquad \text{(A-12)}$$

where γ is the column matrix of regression coefficients, and Y is the column
matrix of the values of the dependent variable. L is a diagonal matrix with the
eigenvalues of X'X on the diagonal, since Z'Z = S'X'XS = L. If all the components
are retained in the model, transformation from γ back to b (b = Sγ) will result in
coefficients identical to those obtained using equation (A-6).

Components are deleted from the regression to overcome the effects of
multicollinearities. Then a least squares regression is performed on the remaining
components. Two criteria are usually considered in deciding which components
should be deleted.

1. Components associated with small eigenvalues are deleted.

2. Components are deleted which are relatively unimportant as predictors
of the dependent variable.

Criterion 1 leads to deletion of components which are relatively unimportant as
predictors of the original independent variables. The remaining components, which
are important as predictors of the independent variables, are not necessarily
highly correlated with the dependent variable.
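Principal components regression with criterion 1 can be sketched as follows, using equation (A-12) and the back-transformation b = Sγ. This is an illustration, not BEIRA's implementation: `pcr_coefficients` and the near-collinear demo data are hypothetical, and criterion 2 would need a different deletion rule. Keeping every component reproduces the ordinary least squares coefficients, and because S is orthogonal, dropping components can only shrink the length of the coefficient vector.

```python
import numpy as np


def pcr_coefficients(X, Y, keep):
    """Principal components regression keeping the `keep` largest-eigenvalue components."""
    p = X.shape[1]
    if not 1 <= keep <= p:
        raise ValueError("keep must be between 1 and the number of variables")
    eigvals, S = np.linalg.eigh(X.T @ X)       # ascending eigenvalues; columns of S are eigenvectors
    order = np.argsort(eigvals)[::-1]          # reorder from largest to smallest
    eigvals, S = eigvals[order], S[:, order]
    Z = X @ S                                  # principal components, Z = XS
    gamma = (Z.T @ Y) / eigvals                # L^-1 Z'Y, since Z'Z = L
    gamma[keep:] = 0.0                         # criterion 1: drop small-eigenvalue components
    return S @ gamma                           # back to the original variables, b = S gamma


rng = np.random.default_rng(3)
base = rng.normal(size=(50, 2))
X = np.column_stack([base[:, 0], base[:, 1], base.sum(axis=1) + 0.05 * rng.normal(size=50)])
Xc = X - X.mean(axis=0)
X = Xc / np.sqrt((Xc ** 2).sum(axis=0))        # standardize as in (A-7)
Y = X @ np.array([1.0, 0.5, 0.0]) + 0.1 * rng.normal(size=50)
Y = Y - Y.mean()
b_full = pcr_coefficients(X, Y, keep=3)
b_two = pcr_coefficients(X, Y, keep=2)
```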


Latent Root Regression : 

Latent root regression examines the relations between the independent
variables and the dependent variable. Latent roots (eigenvalues) and latent vectors
(eigenvectors) of the extended correlation matrix (including the independent and
dependent variables) are examined to

1. Identify multicollinearities among the independent variables, and

2. Determine whether the multicollinearities contribute to prediction of
the dependent variable.

Let A be the matrix of the standardized independent and dependent variables. A'A is
the extended correlation matrix. The jth eigenvalue of A'A can be expressed

$$\lambda_j = \sum_{i=1}^{n}\left(y_i E_{0j} + \sum_{k=1}^{p} x_{ik} E_{kj}\right)^2 \qquad \text{(A-13)}$$

where E_j = (E_0j, E_1j, ..., E_pj)' is the jth eigenvector of A'A corresponding to
λ_j, and E_0j is the component of E_j in the direction of the dependent variable. If
λ_j = 0 for any value of j, each term in equation (A-13) is equal to zero, and an exact
linear relationship exists among some or all of the columns of A. If λ_j = 0 and
E_0j = 0, equation (A-13) becomes

$$\sum_{k=1}^{p} x_{ik} E_{kj} = 0, \qquad i = 1, 2, \ldots, n$$

indicating an exact linear relationship (multicollinearity) among the columns of
X, the matrix of the observed values of the independent variables. Small but
nonzero eigenvalues of A'A indicate near singularities. From equation (A-13)

$$y_i E_{0j} + \sum_{k=1}^{p} x_{ik} E_{kj} \approx 0, \qquad i = 1, 2, \ldots, n \qquad \text{(A-14)}$$

If E_0j, the component of the jth eigenvector in the direction of the dependent
variable, is also small, the relation becomes

$$\sum_{k=1}^{p} x_{ik} E_{kj} \approx 0$$

The dependent variable is not involved, indicating that the multicollinearity is
nonpredictive: the variables involved have little or no effect on the dependent
variable. Variables contributing large components to E_j can be examined to
determine which should be eliminated from the regression. If λ_j ≈ 0 and E_0j is not
small, then the multicollinearity is predictive: the dependent variable is
included in the relationships indicated by the components of the eigenvector.

Program BEIRA prints the components of the first five eigenvectors of A'A
and the corresponding eigenvalues. When it is found that an eigenvalue and the
eigenvector component in the y direction are both small, the correlation coefficients
for the variables with large eigenvector components may be printed. Then, of two
or more variables highly correlated with each other, the one least correlated with
the dependent variable can be eliminated. An example is given in Appendix C.
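The latent root screening described above can be sketched with an eigen-decomposition of the extended correlation matrix of [y, X]. The cutoffs below (eigenvalue under 0.05, |E_0j| under 0.10) are illustrative assumptions rather than values from BEIRA, and `latent_root_diagnostics` and the demo data are hypothetical. In the demo, x3 is nearly x1 + x2 while y depends only on x1 plus noise, so the smallest latent root should be flagged as a nonpredictive multicollinearity.

```python
import numpy as np


def latent_root_diagnostics(X, y, eig_tol=0.05, y_tol=0.10):
    """Flag small latent roots of the extended correlation matrix of [y, X].

    eig_tol and y_tol are illustrative cutoffs (assumptions).
    Returns (eigenvalue, eigenvector, kind) tuples in ascending eigenvalue order.
    """
    R_ext = np.corrcoef(np.column_stack([y, X]), rowvar=False)  # A'A for standardized A
    eigvals, vecs = np.linalg.eigh(R_ext)                        # ascending order
    flagged = []
    for lam, vec in zip(eigvals, vecs.T):
        if lam < eig_tol:
            kind = "nonpredictive" if abs(vec[0]) < y_tol else "predictive"  # vec[0] is E_0j
            flagged.append((lam, vec, kind))
    return flagged


rng = np.random.default_rng(4)
x1, x2 = rng.normal(size=200), rng.normal(size=200)
x3 = x1 + x2 + 0.01 * rng.normal(size=200)
y = x1 + rng.normal(size=200)
flags = latent_root_diagnostics(np.column_stack([x1, x2, x3]), y)
```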


Ridge Regression 

If X'X has a nonuniform eigenvalue spectrum, the regression coefficients
calculated using ordinary least squares may be far removed from the true values (8).
In ridge regression, bias is introduced into the ordinary least squares estimator
to make a nonorthogonal data matrix act more like an orthogonal data matrix. The
diagonal of the normal equations is augmented by a small positive quantity, which
can prevent inflation of the regression coefficients. The ordinary ridge regression
estimator is given by

$$\mathbf{b}(k) = \left(X'X + kI\right)^{-1}X'\mathbf{Y}, \qquad k \ge 0 \qquad \text{(A-15)}$$

where X and Y are matrices of the standardized independent and dependent variables
(equation (A-7)), X'X is the correlation matrix of the independent variables, and k is
a small positive quantity. If k = 0, equation (A-15) becomes the ordinary least
squares estimator, given by equation (A-6).
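The ordinary ridge estimator of equation (A-15) can be sketched in NumPy as follows. `ridge_coefficients` and the near-collinear demo data are hypothetical. Setting k = 0 reproduces the least squares solution of (A-6), and any positive k shortens the coefficient vector, which is the stabilizing effect described above.

```python
import numpy as np


def ridge_coefficients(X, Y, k):
    """Ordinary ridge estimator (A-15): b(k) = (X'X + kI)^-1 X'Y with k >= 0."""
    if k < 0:
        raise ValueError("k must be nonnegative")
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ Y)


rng = np.random.default_rng(5)
base = rng.normal(size=(50, 2))
X = np.column_stack([base[:, 0], base[:, 1], base.sum(axis=1) + 0.05 * rng.normal(size=50)])
Xc = X - X.mean(axis=0)
X = Xc / np.sqrt((Xc ** 2).sum(axis=0))   # standardize as in (A-7)
Y = X @ np.array([1.0, 0.5, 0.0]) + 0.1 * rng.normal(size=50)
Y = Y - Y.mean()
b_ols = ridge_coefficients(X, Y, 0.0)
b_ridge = ridge_coefficients(X, Y, 0.1)
```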

A transformation from variable space to eigenvector space is accomplished by
letting W = XP, where P is the orthogonal matrix of the normalized eigenvectors of X'X.

The model for the dependent variable is

$$\mathbf{Y} = W\boldsymbol{\alpha} + \boldsymbol{\varepsilon}$$

where α = P'b, ε is the random error, and b is the matrix of regression coefficients
of equation (A-6). The regression coefficient matrix becomes the generalized ridge
regression estimator

$$\boldsymbol{\alpha}^* = \left(W'W + K\right)^{-1}W'\mathbf{Y} \qquad \text{(A-16)}$$

where K is a diagonal matrix of nonnegative constants k_i. Since W'W = P'X'XP is the
diagonal matrix of the eigenvalues of X'X, adding K amounts to increasing those
eigenvalues.

When all the k_i's in equation (A-16) are zero, α* is the ordinary least squares
estimator. Program BEIRA calculates the ordinary least squares regression
coefficients using equation (A-16) with K = 0.

The program prints the eigenvalues of R = X'X (the matrix of correlation
coefficients of the independent variables), the value of the determinant of R, and the
trace of R inverse, for use in evaluating the stability of the data matrix. The
user may introduce bias into the regression by changing one of the eigenvalues of
X'X, normally the smallest. The regression coefficients and the quantities
indicating the stability of the data matrix are calculated using the new set of
eigenvalues. An example is given in Appendix C.
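The eigenvalue adjustment described above can be sketched in eigenvector space: the coefficients are α* = (Λ + K)⁻¹P'X'Y with only the smallest eigenvalue changed, then mapped back by b = Pα*. The stability measures |R| and tr(R⁻¹) follow from the adjusted eigenvalues as their product and the sum of their reciprocals. This is an illustration, not BEIRA's code; `adjust_smallest_eigenvalue`, the increment of 0.05, and the demo data are hypothetical.

```python
import numpy as np


def adjust_smallest_eigenvalue(R, r_xy, new_value):
    """Refit after replacing the smallest eigenvalue of R = X'X (generalized ridge idea).

    Returns (b, det_R, trace_R_inv) computed from the adjusted eigenvalues.
    """
    eigvals, P = np.linalg.eigh(R)            # ascending: eigvals[0] is the smallest
    adjusted = eigvals.copy()
    adjusted[0] = new_value
    alpha = (P.T @ r_xy) / adjusted           # (W'W + K)^-1 W'Y with W'Y = P'X'Y
    b = P @ alpha                             # back to the original variables
    det_R = np.prod(adjusted)                 # |R| from the adjusted eigenvalues
    trace_R_inv = np.sum(1.0 / adjusted)      # tr(R^-1) from the adjusted eigenvalues
    return b, det_R, trace_R_inv


rng = np.random.default_rng(6)
base = rng.normal(size=(50, 2))
X = np.column_stack([base[:, 0], base[:, 1], base.sum(axis=1) + 0.05 * rng.normal(size=50)])
Xc = X - X.mean(axis=0)
X = Xc / np.sqrt((Xc ** 2).sum(axis=0))       # standardize as in (A-7)
Y = X @ np.array([1.0, 0.5, 0.0]) + 0.1 * rng.normal(size=50)
R, r_xy = X.T @ X, X.T @ (Y - Y.mean())
smallest = np.linalg.eigvalsh(R)[0]
b_ols, det_ols, tr_ols = adjust_smallest_eigenvalue(R, r_xy, smallest)
b_bias, det_bias, tr_bias = adjust_smallest_eigenvalue(R, r_xy, smallest + 0.05)
```

Raising the smallest eigenvalue increases |R| and decreases tr(R⁻¹), both of which indicate a more stable data matrix.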
APPENDIX B


INDEPENDENT VARIABLES USED IN THE REGRESSIONS


The following tables and figures are in Appendix B.

• Table B-1: Independent variables, listed by number

• Table B-2: Variables chosen by SELECT for slope-change years of the CCEA models

• Figures B-1 through B-14: Plots of maximum adjusted R² from SELECT, including
  subsets of variables for regressions for peak values of adjusted R²
TABLE B-1.- INDEPENDENT VARIABLES USED TO DEVELOP YIELD MODELS


SPRING WHEAT

Precipitation in mm
 1. January
 2. February
 3. March
 4. April
 5. May
 6. June
 7. July
 8. August

Precipitation the year before, in mm
 9. August
10. September
11. October
12. November
13. December

Average Temperature in degrees C
14. April
15. May
16. June
17. July
18. August

19. First Trend
20. Second Trend
21. Constant (for the independent model)


WINTER WHEAT

Precipitation in mm
 1. January
 2. February
 3. March
 4. April
 5. May
 6. June

Precipitation the year before, in mm
 7. August
 8. September
 9. October
10. November
11. December

Average Temperature in degrees C
12. January
13. February
14. March
15. April
16. May
17. June

P.E.T. (a)
18. January
19. February
20. March
21. April
22. May
23. June

24. First Trend
25. Second Trend
26. Constant (for the independent model)


(a) For the Badlands and Montana, January and February PET were not given; for
Nebraska, January PET was not given. Variables for these areas were
renumbered omitting these quantities.
TABLE B-2: VARIABLES CHOSEN BY SELECT FOR SLOPE-CHANGE YEARS OF THE CCEA MODELS
Maximum $R^2_{adj}$ for subsets of variables from SELECT for models of spring wheat yield in North Dakota.

Maximum $R^2_{adj}$ for subsets of variables from SELECT for models of spring wheat yield in the Red River Valley.

Maximum $R^2_{adj}$ for subsets of variables from SELECT for models of winter wheat yield in Kansas.

Figure B-10.- Maximum $R^2_{adj}$ for subsets of variables from SELECT for models of winter wheat yield in Nebraska.

Maximum $R^2_{adj}$ for subsets of variables from SELECT for models of winter wheat yield in Oklahoma.

Figure B-13.- Maximum $R^2_{adj}$ for subsets of variables from SELECT for models of winter wheat yield in the Texas Low Plains.
APPENDIX C


RUNNING THE REGRESSIONS AND CHOOSING THE SUBSET OF 
INDEPENDENT VARIABLES CONSTITUTING THE BEST REGRESSIONS 


Regressions were run using program BEIRA with the independent variables chosen by SELECT as input. When the LRR eigenvalues and eigenvectors indicated the existence of a nonpredictive multicollinearity, the correlation coefficients were examined to determine the magnitude of the correlations between the independent variables involved, and between the independent variables and the yield. When two or more variables were highly correlated (r > 0.85), the variable least correlated with the yield was eliminated. When the magnitudes of the correlation between yield and two variables highly correlated with each other were nearly equal, the regression coefficients were examined to determine the variable contributing least to the dependent variable. When the regression and correlation coefficients indicated that different variables of a highly correlated pair should be deleted, regressions were run with each of the variables deleted. When the LRR eigenvectors and eigenvalues indicated that any remaining multicollinearities were predictive, the eigenvalues of X'X were examined. If they appeared not to increase uniformly, the first (smallest) eigenvalue was increased by 0.1.
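The correlation-based screening rule in the preceding paragraph can be sketched as follows. The function name, the use of |r| against the 0.85 threshold, and the `tie_tol` margin for calling two yield correlations "nearly equal" are assumptions made for illustration; the report settles near-ties by inspecting the regression coefficients, so the sketch only flags them. The correlation matrix is assumed to have yield in its last row and column, as in the Appendix C printouts, where yield is variable 27.

```python
import numpy as np


def deletion_candidates(R_aug, names, threshold=0.85, tie_tol=0.02):
    """Flag highly correlated pairs of independent variables.

    R_aug : correlation matrix of the independent variables with yield
            appended as the last row and column.
    names : labels of the independent variables, in matrix order.
    Returns (var_i, var_j, suggested_deletion, near_tie) tuples; when
    near_tie is True the regression coefficients should decide instead.
    """
    R_aug = np.asarray(R_aug, dtype=float)
    p = R_aug.shape[0] - 1          # number of independent variables
    r_y = np.abs(R_aug[:p, p])      # |r| between each variable and yield
    pairs = []
    for i in range(p):
        for j in range(i + 1, p):
            if abs(R_aug[i, j]) > threshold:
                # Suggest deleting the member of the pair least correlated with yield.
                drop = i if r_y[i] < r_y[j] else j
                near_tie = bool(abs(r_y[i] - r_y[j]) < tie_tol)
                pairs.append((names[i], names[j], names[drop], near_tie))
    return pairs


# Illustrative correlations (not the report's values) for variables 12, 18 and 3.
R_example = [[1.0, 0.9, 0.1, 0.30],
             [0.9, 1.0, 0.2, 0.40],
             [0.1, 0.2, 1.0, 0.50],
             [0.30, 0.40, 0.50, 1.0]]
print(deletion_candidates(R_example, [12, 18, 3]))  # [(12, 18, 12, False)]
```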

The "best regression" for a region was chosen taking into account the values of $R^2$ and $S^2$, the uniformity of the eigenvalues, the values of the determinant of R (the matrix of the correlation coefficients) and the trace of R-inverse, and the distribution of predicted values of yield above and below the measured values, as indicated by the quantity

$$ \Delta = \hat{y} - y $$

where y is the measured value of yield and $\hat{y}$ is the predicted value.
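The fit statistics used in choosing the best regression can be computed from the measured and predicted yields. In this Python sketch, $S^2$ is taken to be the residual mean square, SSE/(n - p) with p the number of fitted coefficients; that definition, the adjusted-$R^2$ formula, and the helper name `fit_summary` are assumptions, since the report does not spell them out here. The count of Δ < 0 is the number of years in which the predicted yield falls below the measured yield.

```python
import numpy as np


def fit_summary(y, y_hat, n_params):
    """Summary statistics for a fitted yield model.

    y        : measured yields
    y_hat    : predicted yields
    n_params : number of fitted coefficients (assumed to include any constant)
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    delta = y_hat - y                          # Delta = predicted - measured
    sse = np.sum(delta ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_params)
    s2 = sse / (n - n_params)                  # assumed definition of S^2
    return {
        "R2": r2,
        "R2_adj": r2_adj,
        "S2": s2,
        "n_below": int(np.sum(delta < 0)),     # predicted below measured
        "n": n,
    }


print(fit_summary([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8], n_params=2))
```

A value of n_below close to n/2 corresponds to the balanced distribution of predictions above and below the measured yields that the selection criteria favour.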

The two regressions for Texas Low Plains illustrate the method for deciding which variables should be eliminated and for adding bias by changing the value of an eigenvalue. Variables chosen by SELECT for the peak values of $R^2_{adj}$ are listed in Figure B-13, in Appendix B.

Figure C-1 shows the printout, starting with the LRR eigenvectors, for the dependent model with slope change in 1944 and the 11 independent variables chosen by SELECT. The values of the first four LRR eigenvalues are small. The components of the first three LRR eigenvectors in the direction of the dependent variable (yield, variable 27) are also small. The component of the fourth eigenvector (24) in the yield direction, however, is large (0.718974), indicating that this eigenvector is predictive. Large components of the first three eigenvectors are underlined, and in the next step of the program the correlation coefficients are requested.
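The LRR screening described above can be approximated by an eigen-decomposition of the correlation matrix of the independent variables with yield appended last. In this Python sketch an eigenvector is called nonpredictive when its eigenvalue is small and its yield component is small, and predictive when its eigenvalue is small but its yield component is large. The cutoffs `small_eig = 0.05` and `small_comp = 0.10` and the function name are illustrative assumptions, not the values used by BEIRA.

```python
import numpy as np


def latent_root_screen(X, y, small_eig=0.05, small_comp=0.10):
    """Classify eigenvectors of the correlation matrix of [X, y].

    Returns (eigenvalue, yield_component, label) tuples, smallest
    eigenvalue first; label is "-" when the eigenvalue is not small.
    """
    A = np.column_stack([np.asarray(X, dtype=float), np.asarray(y, dtype=float)])
    A = A - A.mean(axis=0)
    A = A / np.sqrt((A ** 2).sum(axis=0))     # A'A is the correlation matrix
    eigvals, vecs = np.linalg.eigh(A.T @ A)   # ascending eigenvalues
    results = []
    for lam, v in zip(eigvals, vecs.T):       # rows of vecs.T are eigenvectors
        y_comp = v[-1]                        # component in the yield direction
        if lam < small_eig:
            label = "predictive" if abs(y_comp) >= small_comp else "nonpredictive"
        else:
            label = "-"
        results.append((float(lam), float(y_comp), label))
    return results


# Synthetic data: x1 and x2 are nearly identical but unrelated to yield,
# while yield closely follows x3.
rng = np.random.default_rng(2)
x1 = rng.normal(size=45)
x2 = x1 + rng.normal(scale=0.001, size=45)
x3 = rng.normal(size=45)
y = x3 + rng.normal(scale=0.2, size=45)
for row in latent_root_screen(np.column_stack([x1, x2, x3]), y):
    print(row)
```

Here the x1-x2 near-dependence gives a tiny eigenvalue with almost no yield component (nonpredictive), while the strong x3-yield relation gives a small eigenvalue with a large yield component (predictive).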

Variables 17 and 23, 14 and 20, and 12 and 18 are found to be highly correlated. Relations between the correlation coefficients for these variables and the dependent variable, yield, in the last column (headed 27) are examined. The following relations are found:

$$ |r_{17,y}| \approx |r_{23,y}|, \quad \text{with } |r_{17,y}| \text{ slightly smaller} $$

$$ |r_{14,y}| \approx |r_{20,y}|, \quad \text{with } |r_{20,y}| \text{ somewhat smaller} $$

$$ |r_{12,y}| \approx |r_{18,y}|, \quad \text{with } |r_{12,y}| \text{ somewhat smaller.} $$


The rule that the independent variables to be deleted are those least correlated with the dependent variable leads to deletion of variables 17, 20, and 12. However, examination of the regression coefficients (the beta vector) shows that

$$ |\beta_{17}| > |\beta_{23}|, \quad |\beta_{14}| > |\beta_{20}|, \quad |\beta_{12}| > |\beta_{18}|, $$

indicating that variables 17 and 12 have a greater influence on the dependent variable than do 23 and 18.

Figure C-2 is the printout for the regression with variables 17, 20, and 12 deleted. In Figure C-3, variables 23, 20, and 18 are deleted. For both of these regressions the first LRR eigenvalue is small and the y-component of the first eigenvector is large, indicating prediction. The eigenvalues for both regressions are uniform, and the values of the determinant of R and the trace of R-inverse for the two regressions are nearly equal. The program calculates the predicted value of yield for each year and subtracts the measured yield from the predicted yield. For both regressions, 25 of the 45 predicted yields were less than the measured yield, indicated in the printout by 25 Δ < 0. The regression of Figure C-3 was chosen as the best regression because of its somewhat higher $R^2$ and smaller $S^2$.

Figures C-4, C-5, and C-6 show printouts for the independent model with slope change in 1956: with all the variables chosen by SELECT, with variable 12 deleted, and with variable 18 deleted, respectively. The LRR eigenvectors and eigenvalues in Figure C-4 indicate that variables 12 and 18 constitute a nonpredictive multicollinearity. The correlation coefficients for these two variables and yield (variable 27) are nearly equal, with that for variable 12 and yield being somewhat smaller.
Figure C-2.- Regression for Texas Low Plains: Dependent Model, slope change in 1948, variables 17, 20, 12 deleted from the set chosen by SELECT.

Figure C-4.- Regression for Texas Low Plains: Independent Model, slope change in 1956, 11 variables chosen by SELECT.

Figure C-5.- Regression for Texas Low Plains: Independent Model, slope change in 1956, variable 12 deleted from the set chosen by SELECT.

Figure C-6.- Regression for Texas Low Plains: Independent Model, slope change in 1956, variable 18 deleted from the set chosen by SELECT.



Comparison of the regression coefficients in Figure C-4 indicates that variable 12 makes a larger contribution to yield than variable 18. Figures C-5 and C-6 show somewhat better values of the determinant of R and the trace of R-inverse, and of $R^2$ and $S^2$, when variable 18 is deleted. Examination of the eigenvalues in Figure C-6 indicates that increasing the value of the first one by 0.1 would make them more uniform.

The result is shown in Figure C-7. The values of the determinant of R and the trace of R-inverse are improved, the values of $R^2$ and $S^2$ are nearly the same as before, and the number of predicted yield values greater than the measured yield is nearly the same as the number less than the measured yield (23 Δ < 0 out of 45 observations). This regression was chosen as the best regression.

A number of the subsets of independent variables chosen by SELECT did not include one or both of the trend variables, or the constant variable (for the independent model). Several regressions were run with the omitted trend or constant variable added to those chosen by SELECT. The values of $R^2$ remained almost the same, and the values of $S^2$ increased somewhat. The distribution of predicted yield values above and below the measured values was more uniform without the added variables, and the values of the determinant of R and the trace of R-inverse indicated more stability in the data matrices for the subsets as chosen by SELECT.
Figure C-7.- Regression for Texas Low Plains: first eigenvalue of Figure C-6 increased by 0.1.
Figure D-1(b).- Trend lines and predicted and measured yields for best regression from SELECT-BEIRA for Minnesota, 1954 Independent Model.
Figure D-2(b).- Trend lines and predicted and measured yields for best regression from SELECT-BEIRA for Montana spring wheat, 1943 Independent Model.
Figure D-3(a).- Trend lines and predicted and measured yields for best regression from SELECT-BEIRA for North Dakota, 1953 Dependent Model.
Figure D-3(b).- Trend lines and predicted and measured yields for best regression from SELECT-BEIRA for North Dakota, 1961 Independent Model.
Figure D-4(a).- Trend lines and predicted and measured yields for best regression from SELECT-BEIRA for Red River Valley, 1951 Dependent Model.
Figure D-5(a).- Trend lines and predicted and measured yields for best regression from SELECT-BEIRA for South Dakota, 1953 Dependent Model.
Figure D-5(b).- Trend lines and predicted and measured yields for best regression from SELECT-BEIRA for South Dakota, 1966 Independent Model.
Figure D-6(b).- Trend lines and predicted and measured yields for best regression from SELECT-BEIRA for Badlands, 1975 Independent Model.
1955 Dependent, 12 variables; 20, 4 deleted

Independent model: 15, 20 deleted
Figure D-9(a).- Trend lines and predicted and measured yields for best regression from SELECT-BEIRA for Montana winter wheat, 1934 Dependent Model.
Figure D-9(b).- Trend lines and predicted and measured yields for best regression from SELECT-BEIRA for Montana winter wheat, 1944 Independent Model.
Figure D-10(b).- Trend lines and predicted and measured yields for best regression from SELECT-BEIRA for Nebraska, 1969 Independent Model.
Figure D-11(b).- Trend lines and predicted and measured yields for best regression from SELECT-BEIRA for Oklahoma, 1958 Independent Model.
Figure D-12(a).- Trend lines and predicted and measured yields for best regression from SELECT-BEIRA for Texas Edwards Plateau, 1949 Dependent Model.
Figure D-12(b).- Trend lines and predicted and measured yields for best regression from SELECT-BEIRA for Texas Edwards Plateau, 1957 Independent Model.
Figure D-13(a).- Trend lines and predicted and measured yields for best regression from SELECT-BEIRA for Texas Low Plains, 1956 Independent Model.
APPENDIX E

CENTER FOR CLIMATIC AND ENVIRONMENTAL ASSESSMENT* (CCEA) MODELS


The following tables and figures are in Appendix E.

• Regression coefficients

Table E-1: The Center for Climatic and Environmental Assessment Models

(a) spring wheat

(b) winter wheat

• Plots showing measured yields and trend lines
Figures E-1 through E-12


*This is a division of the U.S. Department of Agriculture.
(a) Spring wheat

(b) Winter wheat
DPN - Departure from normal.

PET - Potential evapotranspiration.
Figure E-1.- Trend lines for the CCEA models for spring wheat yield in Minnesota.



Figure E-2.- Trend lines for the CCEA models for spring wheat yield in Montana.





Figure E-3.- Trend lines for CCEA models for spring wheat yield in North Dakota.



Figure E-4.- Trend lines for CCEA models for spring wheat 
yield in South Dakota. 
Figure E-7.- Trend lines for CCEA models for winter wheat yield in Montana.
Figure E-8.- Trend lines for CCEA models for winter wheat yield in Nebraska.
Figure E-9.- Trend lines for CCEA models for winter wheat yield in Oklahoma.



Figure E-10.- Trend lines for CCEA models for winter wheat yield in the Texas Edwards Plateau.
Figure E-11.- Trend lines for CCEA models for winter wheat yield in the Texas Low Plains.
Figure E-12.- Trend lines for CCEA models for winter wheat yield in the Texas-Oklahoma Panhandle.